11 research outputs found

    Near-Optimal Straggler Mitigation for Distributed Gradient Methods

    Modern learning algorithms use gradient descent updates to train inferential models that best explain data. Scaling these approaches to massive data sizes requires proper distributed gradient descent schemes where distributed worker nodes compute partial gradients based on their partial and local data sets, and send the results to a master node where all the computations are aggregated into a full gradient and the learning model is updated. However, a major performance bottleneck arises when some of the worker nodes run slow. These nodes, a.k.a. stragglers, can significantly slow down computation, as the slowest node may dictate the overall computation time. We propose a distributed computing scheme, called Batched Coupon's Collector (BCC), to alleviate the effect of stragglers in gradient methods. We prove that our BCC scheme is robust to a near-optimal number of random stragglers. We also empirically demonstrate that our proposed BCC scheme reduces the run-time by up to 85.4% over Amazon EC2 clusters when compared with other straggler mitigation strategies. We also generalize the proposed BCC scheme to minimize the completion time when implementing gradient descent-based algorithms over heterogeneous worker nodes.

    Lagrange Coded Computing: Optimal Design for Resiliency, Security and Privacy

    We consider a scenario involving computations over a massive dataset stored distributedly across multiple workers, which is at the core of distributed learning algorithms. We propose Lagrange Coded Computing (LCC), a new framework to simultaneously provide (1) resiliency against stragglers that may prolong computations; (2) security against Byzantine (or malicious) workers that deliberately modify the computation for their benefit; and (3) (information-theoretic) privacy of the dataset amidst possible collusion of workers. LCC, which leverages the well-known Lagrange polynomial to create computation redundancy in a novel coded form across workers, can be applied to any computation scenario in which the function of interest is an arbitrary multivariate polynomial of the input dataset, hence covering many computations of interest in machine learning. LCC significantly generalizes prior works to go beyond linear computations. It also enables secure and private computing in distributed settings, improving the computation and communication efficiency of the state-of-the-art. Furthermore, we prove the optimality of LCC by showing that it achieves the optimal tradeoff between resiliency, security, and privacy, i.e., in terms of tolerating the maximum number of stragglers and adversaries, and providing data privacy against the maximum number of colluding workers. Finally, we show via experiments on Amazon EC2 that LCC speeds up the conventional uncoded implementation of distributed least-squares linear regression by up to 13.43×, and also achieves a 2.36×-12.65× speedup over the state-of-the-art straggler mitigation strategies.
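
    The Lagrange-polynomial encoding described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it covers only the straggler-resilience part (no Byzantine security or privacy padding), uses scalar data instead of matrices, and the field size, evaluation points, polynomial `f`, and function names are all assumptions chosen for illustration. The idea is to interpolate a polynomial u(z) through the K data blocks, hand each of the N workers one evaluation of u, and recover every f(X_j) from any deg(f)·(K-1)+1 worker results.

```python
# Illustrative sketch only; all names and parameters below are hypothetical.
P = 2_147_483_647  # prime modulus 2**31 - 1 (assumed field size)


def lagrange_eval(xs, ys, z, p=P):
    """Evaluate at z the unique polynomial through (xs[i], ys[i]) over GF(p)."""
    total = 0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        num, den = 1, 1
        for j, xj in enumerate(xs):
            if j != i:
                num = num * (z - xj) % p
                den = den * (xi - xj) % p
        # pow(den, -1, p) is the modular inverse (Python 3.8+)
        total = (total + yi * num * pow(den, -1, p)) % p
    return total


def f(x, p=P):
    """Example degree-2 polynomial computed by every worker (assumed)."""
    return (x * x + 3 * x + 1) % p


def lcc_demo():
    data = [5, 11, 42]            # K = 3 data blocks (scalars for simplicity)
    betas = [1, 2, 3]             # interpolation points: u(beta_j) = X_j
    alphas = list(range(10, 18))  # N = 8 workers, one evaluation point each
    degree_f = 2

    # Encode: worker i stores u(alpha_i), a coded combination of the data.
    encoded = [lagrange_eval(betas, data, a) for a in alphas]
    # Each worker applies f to its coded share.
    results = [(a, f(x)) for a, x in zip(alphas, encoded)]

    # f(u(z)) has degree degree_f * (K - 1), so this many results suffice.
    threshold = degree_f * (len(data) - 1) + 1  # 5
    # Pretend the first three workers straggle; use the remaining five.
    survivors = results[len(results) - threshold:]
    xs, ys = zip(*survivors)

    # Decode: interpolate f(u(z)) and read it off at the beta points.
    decoded = [lagrange_eval(xs, ys, b) for b in betas]
    expected = [f(x) for x in data]
    return decoded, expected


if __name__ == "__main__":
    decoded, expected = lcc_demo()
    print(decoded, expected)
```

    In this toy setting, 8 workers tolerate up to 3 stragglers because only 5 results are needed; a higher-degree `f` or more data blocks raises that recovery threshold.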

    Lagrange Coded Computing: Optimal Design for Resiliency, Security, and Privacy

    We consider a scenario involving computations over a massive dataset stored distributedly across multiple workers, which is at the core of distributed learning algorithms. We propose Lagrange Coded Computing (LCC), a new framework to simultaneously provide (1) resiliency against stragglers that may prolong computations; (2) security against Byzantine (or malicious) workers that deliberately modify the computation for their benefit; and (3) (information-theoretic) privacy of the dataset amidst possible collusion of workers. LCC, which leverages the well-known Lagrange polynomial to create computation redundancy in a novel coded form across workers, can be applied to any computation scenario in which the function of interest is an arbitrary multivariate polynomial of the input dataset, hence covering many computations of interest in machine learning. LCC significantly generalizes prior works to go beyond linear computations. It also enables secure and private computing in distributed settings, improving the computation and communication efficiency of the state-of-the-art. Furthermore, we prove the optimality of LCC by showing that it achieves the optimal tradeoff between resiliency, security, and privacy, i.e., in terms of tolerating the maximum number of stragglers and adversaries, and providing data privacy against the maximum number of colluding workers. Finally, we show via experiments on Amazon EC2 that LCC speeds up the conventional uncoded implementation of distributed least-squares linear regression by up to 13.43×, and also achieves a 2.36×-12.65× speedup over the state-of-the-art straggler mitigation strategies

    Mapping disparities in education across low- and middle-income countries

    Analyses of the proportions of individuals who have completed key levels of schooling across all low- and middle-income countries from 2000 to 2017 reveal inequalities across countries as well as within populations. Educational attainment is an important social determinant of maternal, newborn, and child health (1-3). As a tool for promoting gender equity, it has gained increasing traction in popular media, international aid strategies, and global agenda-setting (4-6). The global health agenda is increasingly focused on evidence of precision public health, which illustrates the subnational distribution of disease and illness (7,8); however, an agenda focused on future equity must integrate comparable evidence on the distribution of social determinants of health (9-11). Here we expand on the available precision SDG evidence by estimating the subnational distribution of educational attainment, including the proportions of individuals who have completed key levels of schooling, across all low- and middle-income countries from 2000 to 2017. Previous analyses have focused on geographical disparities in average attainment across Africa or for specific countries, but, to our knowledge, no analysis has examined the subnational proportions of individuals who completed specific levels of education across all low- and middle-income countries (12-14). By geolocating subnational data for more than 184 million person-years across 528 data sources, we precisely identify inequalities across geography as well as within populations.

    Burden of disease scenarios for 204 countries and territories, 2022–2050: a forecasting analysis for the Global Burden of Disease Study 2021

    Background: Future trends in disease burden and drivers of health are of great interest to policy makers and the public at large. This information can be used for policy and long-term health investment, planning, and prioritisation. We have expanded and improved upon previous forecasts produced as part of the Global Burden of Diseases, Injuries, and Risk Factors Study (GBD) and provide a reference forecast (the most likely future), and alternative scenarios assessing disease burden trajectories if selected sets of risk factors were eliminated from current levels by 2050. Methods: Using forecasts of major drivers of health such as the Socio-demographic Index (SDI; a composite measure of lag-distributed income per capita, mean years of education, and total fertility under 25 years of age) and the full set of risk factor exposures captured by GBD, we provide cause-specific forecasts of mortality, years of life lost (YLLs), years lived with disability (YLDs), and disability-adjusted life-years (DALYs) by age and sex from 2022 to 2050 for 204 countries and territories, 21 GBD regions, seven super-regions, and the world. All analyses were done at the cause-specific level so that only risk factors deemed causal by the GBD comparative risk assessment influenced future trajectories of mortality for each disease. Cause-specific mortality was modelled using mixed-effects models with SDI and time as the main covariates, and the combined impact of causal risk factors as an offset in the model. At the all-cause mortality level, we captured unexplained variation by modelling residuals with an autoregressive integrated moving average model with drift attenuation. These all-cause forecasts constrained the cause-specific forecasts at successively deeper levels of the GBD cause hierarchy using cascading mortality models, thus ensuring a robust estimate of cause-specific mortality. 
For non-fatal measures (eg, low back pain), incidence and prevalence were forecasted from mixed-effects models with SDI as the main covariate, and YLDs were computed from the resulting prevalence forecasts and average disability weights from GBD. Alternative future scenarios were constructed by replacing appropriate reference trajectories for risk factors with hypothetical trajectories of gradual elimination of risk factor exposure from current levels to 2050. The scenarios were constructed from various sets of risk factors: environmental risks (Safer Environment scenario), risks associated with communicable, maternal, neonatal, and nutritional diseases (CMNNs; Improved Childhood Nutrition and Vaccination scenario), risks associated with major non-communicable diseases (NCDs; Improved Behavioural and Metabolic Risks scenario), and the combined effects of these three scenarios. Using the Shared Socioeconomic Pathways climate scenarios SSP2-4.5 as reference and SSP1-1.9 as an optimistic alternative in the Safer Environment scenario, we accounted for climate change impact on health by using the most recent Intergovernmental Panel on Climate Change temperature forecasts and published trajectories of ambient air pollution for the same two scenarios. Life expectancy and healthy life expectancy were computed using standard methods. The forecasting framework includes computing the age-sex-specific future population for each location and separately for each scenario. 95% uncertainty intervals (UIs) for each individual future estimate were derived from the 2·5th and 97·5th percentiles of distributions generated from propagating 500 draws through the multistage computational pipeline. Findings: In the reference scenario forecast, global and super-regional life expectancy increased from 2022 to 2050, but improvement was at a slower pace than in the three decades preceding the COVID-19 pandemic (beginning in 2020). 
Gains in future life expectancy were forecasted to be greatest in super-regions with comparatively low life expectancies (such as sub-Saharan Africa) compared with super-regions with higher life expectancies (such as the high-income super-region), leading to a trend towards convergence in life expectancy across locations between now and 2050. At the super-region level, forecasted healthy life expectancy patterns were similar to those of life expectancies. Forecasts for the reference scenario found that health will improve in the coming decades, with all-cause age-standardised DALY rates decreasing in every GBD super-region. The total DALY burden measured in counts, however, will increase in every super-region, largely a function of population ageing and growth. We also forecasted that both DALY counts and age-standardised DALY rates will continue to shift from CMNNs to NCDs, with the most pronounced shifts occurring in sub-Saharan Africa (60·1% [95% UI 56·8–63·1] of DALYs were from CMNNs in 2022 compared with 35·8% [31·0–45·0] in 2050) and south Asia (31·7% [29·2–34·1] to 15·5% [13·7–17·5]). This shift is reflected in the leading global causes of DALYs, with the top four causes in 2050 being ischaemic heart disease, stroke, diabetes, and chronic obstructive pulmonary disease, compared with 2022, with ischaemic heart disease, neonatal disorders, stroke, and lower respiratory infections at the top. The global proportion of DALYs due to YLDs likewise increased from 33·8% (27·4–40·3) to 41·1% (33·9–48·1) from 2022 to 2050, demonstrating an important shift in overall disease burden towards morbidity and away from premature death. The largest shift of this kind was forecasted for sub-Saharan Africa, from 20·1% (15·6–25·3) of DALYs due to YLDs in 2022 to 35·6% (26·5–43·0) in 2050. 
In the assessment of alternative future scenarios, the combined effects of the scenarios (Safer Environment, Improved Childhood Nutrition and Vaccination, and Improved Behavioural and Metabolic Risks scenarios) demonstrated an important decrease in the global burden of DALYs in 2050 of 15·4% (13·5–17·5) compared with the reference scenario, with decreases across super-regions ranging from 10·4% (9·7–11·3) in the high-income super-region to 23·9% (20·7–27·3) in north Africa and the Middle East. The Safer Environment scenario had its largest decrease in sub-Saharan Africa (5·2% [3·5–6·8]), the Improved Behavioural and Metabolic Risks scenario in north Africa and the Middle East (23·2% [20·2–26·5]), and the Improved Nutrition and Vaccination scenario in sub-Saharan Africa (2·0% [–0·6 to 3·6]). Interpretation: Globally, life expectancy and age-standardised disease burden were forecasted to improve between 2022 and 2050, with the majority of the burden continuing to shift from CMNNs to NCDs. That said, continued progress on reducing the CMNN disease burden will be dependent on maintaining investment in and policy emphasis on CMNN disease prevention and treatment. Mostly due to growth and ageing of populations, the number of deaths and DALYs due to all causes combined will generally increase. By constructing alternative future scenarios wherein certain risk exposures are eliminated by 2050, we have shown that opportunities exist to substantially improve health outcomes in the future through concerted efforts to prevent exposure to well established risk factors and to expand access to key health interventions

    Mapping routine measles vaccination in low- and middle-income countries

    Mapping subnational HIV mortality in six Latin American countries with incomplete vital registration systems
